Abstract: Motivated by the need for formal guarantees on 
the stability and safety of controllers for challenging robot 
control tasks, we present a control design procedure that 
explicitly seeks to maximize the size of an invariant "funnel" 
that leads to a predefined goal set. Our certificates of invariance 
are given in terms of sums of squares proofs of a set of 
appropriately defined Lyapunov inequalities. These certificates, 
together with our proposed polynomial controllers, can be 
efficiently obtained via semidefinite optimization. Our approach 
can handle time-varying dynamics resulting from tracking a 
given trajectory, input saturations (e.g. torque limits), and can 
be extended to deal with uncertainty in the dynamics and state. 
The resulting controllers can be used by space-filling feedback 
motion planning algorithms to fill up the space with significantly 
fewer trajectories. We demonstrate our approach on a severely 
torque limited underactuated double pendulum (Acrobot) and 
provide extensive simulation and hardware validation. 

I. Introduction 

Challenging robotic tasks such as walking, running and 
flying require control techniques that provide guarantees on 
the performance and safety of the nonlinear dynamics of the 
system. Much recent progress has been made in generating 
open-loop motion plans for high-dimensional kinematically 
and dynamically constrained systems. These methods include 
Rapidly Exploring Randomized Trees (RRTs) [13], [11] 
and trajectory optimization [4], and have been successfully 
applied for solving motion planning problems in a variety of 
domains [11], [12]. However, open-loop motion plans alone 
are not sufficient to perform challenging control tasks such as 
robot locomotion; usually, one needs a stabilizing feedback 
controller to correct for deviations from the planned trajec- 
tory. Popular feedback control techniques include methods 
based on linearization such as the Linear Quadratic Regulator 
(LQR) [14] and partial feedback linearization [21]. While 
these approaches are relatively easy to implement, they are 
unable to directly reason about nonlinear dynamics and 
input saturations (e.g. torque limits). The resulting controllers 
can require a large degree of hand-tuning and importantly, 
provide no explicit guarantees about the performance of the 
nonlinear system. 

Fig. 1. The "Acrobot" used for hardware experiments.

Another popular approach for feedback control is linear 
Model Predictive Control (MPC) [6]. While this method can 
deal with input saturations, it is again unable to provide 
guarantees on the stability/safety of the resulting controller 
on the nonlinear system. Dynamic programming has also 
been used for controlling robots [3]. Although this approach 
does reason about the natural dynamics of the system and 
can provide guarantees of the safety of the resulting closed 
loop system, it requires some form of discretization of the 
state space and dynamics. Thus, discretization errors and 
the curse of dimensionality have prevented this method 
from being applied successfully on challenging control tasks. 
Differential dynamic programming [8] seeks to address some 
of these issues, but is local in nature and thus cannot provide 
guarantees on the full nonlinear system. 

In this paper, we provide an alternative approach that 
employs Lyapunov's stability theory for designing controllers 
that explicitly reason about the nonlinear dynamics of the 
system and provide guarantees on the stability of the resulting 
closed loop dynamics. The power of Lyapunov stability 
theorems stems from the fact that they turn questions about 
the behavior of trajectories of dynamical systems into questions 
about positivity or nonnegativity of functions. For example, 
asymptotic stability of an equilibrium point of a continuous 
time dynamical system, $\dot{x} = f(x)$, is proven (roughly 
speaking) if one succeeds in finding a Lyapunov function 
$V$ satisfying $V(x) > 0$ and 
$-\dot{V}(x) = -\langle \nabla V(x), f(x) \rangle > 0$; 
i.e., a scalar valued function that is positive and monotonically 
decreases along the trajectories of the dynamical 
system. Except for very simple systems (e.g. linear systems), 
the search for Lyapunov functions has traditionally been 
a daunting task. More recently, however, techniques from 
convex optimization and algorithmic algebra such as sums- 
of-squares programming [20] have emerged as a machinery 
for computing polynomial Lyapunov functions and have had 
a large impact on the controls community [7]. The sums-of- 
squares (SOS) approach relies on our ability to efficiently 
check if a polynomial can be expressed as a sum of squares 
of other polynomials. Since both of Lyapunov's conditions 
are checks on positivity (or nonnegativity) of functions, 
one can search over parameterized families of polynomial 
Lyapunov functions which satisfy the stronger requirement of 
admitting sums of squares decompositions. This search can 
be cast as a semidefinite optimization program and solved 
efficiently using interior point methods [20]. Although there 
is in general a gap between nonnegative and sum of squares 
polynomials, the gap has been shown to be small in practical 
applications that apply these techniques to Lyapunov stability 
theorems. A theoretical study of this gap has also recently 
appeared [2, Chap. 4]. 
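
To make the notion of a sum of squares certificate concrete, the following minimal sketch checks one candidate decomposition numerically. The polynomial and its decomposition are a standard illustrative example and are not taken from this paper; the names `p`, `sos_certificate` and `max_mismatch` are hypothetical. The sketch only verifies a given certificate by sampling; it does not perform the semidefinite search that produces one.

```python
import random


def p(x, y):
    """Illustrative quartic form: 2x^4 + 2x^3 y - x^2 y^2 + 5y^4."""
    return 2 * x**4 + 2 * x**3 * y - x**2 * y**2 + 5 * y**4


def sos_certificate(x, y):
    """Candidate certificate: p = 0.5 * (q1^2 + q2^2), so p >= 0 everywhere."""
    q1 = 2 * x**2 - 3 * y**2 + x * y
    q2 = y**2 + 3 * x * y
    return 0.5 * (q1**2 + q2**2)


def max_mismatch(num_samples=1000, seed=0):
    """Largest relative gap between p and its certificate over random samples."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(num_samples):
        x, y = rng.uniform(-2, 2), rng.uniform(-2, 2)
        scale = 1.0 + abs(p(x, y))
        worst = max(worst, abs(p(x, y) - sos_certificate(x, y)) / scale)
    return worst
```

If the two expressions agree as polynomials, the mismatch is at the level of floating point round-off, and nonnegativity of `p` follows because it is written as a sum of squares.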

While a large body of work in the controls literature 
has focused on leveraging these tools for the design and 
verification of controllers, the focus has almost exclusively 
been on controlling time-invariant systems to an equilib- 
rium point; see e.g. [10], [9]. While this is of practical 
importance in many areas of control engineering, it has 
limited applicability in robotics since most tasks involve 
controlling the system along a trajectory instead of to a fixed 
point. There has been recent work on computing regions of 
finite time invariance ("funnels") around trajectories using 
sums-of-squares programming, but this work has focused 
on computing funnels for a fixed time-varying controller 
[25]. In this paper, we seek to build on these results and 
use sums-of-squares programming for the design of time- 
varying controllers that maximize the size of the resulting 
"funnel"; i.e. maximize the size of the set of states that 
reach a pre-defined goal set. Our approach is able to directly 
handle the time-varying nature of the dynamics (resulting 
from following a trajectory), can obey limits on actuation 
(e.g. torque limits), and can be extended to handle scenarios 
in which there is uncertainty in the dynamics. 

We hope that our sums-of-squares based control design 
technique will have a large impact on control synthesis meth- 
ods that rely on the sequential composition of controllers 
(first introduced to the robotics community in [5]). More 
recently, the LQR-Trees algorithm [24] has been proposed 
for filling up a space with locally stabilizing controllers 
that are sequentially composed to drive a large set of initial 
conditions to a goal state. This has also been extended to an 
online planning framework where one does not have access 
to kinematic constraints (e.g. obstacles) until runtime and is 
also faced with uncertainty about the dynamics and state 
while performing a task [18]. Both of these frameworks make 
use of fixed controllers (e.g. LQR, H-infinity) that are not 
specifically designed to maximize the size of the funnels 
computed. Thus, by employing controllers that explicitly 
seek to maximize the size of the verified funnels, the 
space-filling algorithms will be able to achieve more with fewer 
planned trajectories. 

We demonstrate our approach through extensive hardware 
experiments on a torque limited underactuated double pen- 
dulum ("Acrobot") performing the classic "swing-up and 
balance" task [22]. To our knowledge, these experimental 
results provide the first hardware validation of sums-of- 
squares programming based "funnels". 



II. Time Invariant Controller Design 

In this section, we present our method for designing time- 
invariant controllers that stabilize a system to an equilibrium 
point. This approach is similar in many respects to previous 
work [10], [9]. However, we still present it here as it 
helps motivate the time-varying controller design section and 
differs from prior work in some details of implementation. 
We also use this method for balancing the Acrobot about the 
upright position in Section VI. 



Given a polynomial control affine system, $\dot{x} = f(x) + g(x)u$, 
with state $x \in \mathbb{R}^n$ and control input 
$u \in \mathbb{R}^m$, our task is to find a controller, $u(x)$, 
that stabilizes the system to a fixed point. Without loss of 
generality, we assume that the goal point is the origin. In 
order to make our search amenable to sums-of-squares 
programming, we restrict ourselves to searching over 
controllers that are polynomials in the state variables, i.e., 
$u(x)$ is a polynomial of some fixed degree. 

A natural metric for the performance of the stabilizing 
controller is the size of the region of attraction of the 
resulting closed loop system. The region of attraction is the 
set of points that asymptotically converge to the origin. Thus, 
we will seek to design controllers that produce the "largest" 
region of attraction. If we can find a function $V(x)$, with 
$V(0) = 0$, and a sub-level set, 
$B_\rho = \{x \mid V(x) \le \rho\}$, that satisfies: 

$$x \in B_\rho,\ x \ne 0 \implies V(x) > 0,\ \dot{V}(x) < 0,$$

we can conclude that $B_\rho$ is an inner approximation of the true 
region of attraction. This representation also yields a natural 
metric for the "largeness" of the resulting verified region 
of attraction. We aim to design controllers that maximize $\rho$ 
subject to a normalization constraint on $V(x)$. (If one does 
not normalize $V(x)$, $\rho$ can be made arbitrarily large simply 
by scaling the coefficients of $V(x)$.) 

Denoting the closed loop system by $f_{cl}(x, u(x))$, we can 
compute: 

$$\dot{V}(x) = \frac{\partial V(x)}{\partial x}\,\dot{x} = \frac{\partial V(x)}{\partial x}\, f_{cl}(x, u(x)).$$

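Then, the following sums-of-squares program can be used 
to find a controller that maximizes the size of the verified 
region of attraction: 

$$
\begin{aligned}
\underset{\rho,\,L(x),\,V(x),\,u(x)}{\text{maximize}} \quad & \rho \qquad (1) \\
\text{subject to} \quad & V(x) \ \text{SOS} \qquad (2) \\
& -\dot{V}(x) + L(x)\,(V(x) - \rho) \ \text{SOS} \qquad (3) \\
& L(x) \ \text{SOS} \qquad (4) \\
& V\Big(\sum_j e_j\Big) = 1 \qquad (5)
\end{aligned}
$$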

Here, $L(x)$ is a non-negative "multiplier" term and $e_j$ is the 
$j$-th standard basis vector for the state space $\mathbb{R}^n$. It 
is easy to see that the above conditions are sufficient for 
establishing $B_\rho$ as an inner estimate of the region of 
attraction for the system. When $x \in B_\rho$, we have by 
definition that $V(x) \le \rho$. Then, since $L(x)$ is constrained 
to be non-negative, condition (3) implies that 
$\dot{V}(x) < 0$.^1 Condition (5) is a normalization 
constraint on $V(x)$ and is a linear constraint on the 
coefficients of $V(x)$. Note that this normalization does not 
introduce conservativeness since if a valid Lyapunov function 
exists, one can always scale it to satisfy this normalization 
constraint. 

The above optimization program is not convex in general 
since it involves conditions that are bilinear in the decision 
variables. However, the conditions are linear in $L(x)$ and 
$u(x)$ for fixed $V(x)$, and are linear in $V(x)$ for fixed $L(x)$ 
and $u(x)$. Thus, we can efficiently perform the optimization 
by alternating between the two sets of decision variables, 
$(L(x), u(x))$ and $V(x)$, repeating the following two steps 
until convergence: (1) Fix $V(x)$ and search over $u(x)$ and 
$L(x)$, and (2) Fix $u(x)$ and $L(x)$ and search over $V(x)$. In both 
steps, we can optimize $\rho$, albeit in slightly different ways. 
In Step (2), $\rho$ appears linearly in the constraints (since $L(x)$ 
is fixed) and thus we can optimize it directly in the SOS 
program. In Step (1), we can perform binary search over $\rho$ 
in order to maximize it. Each iteration of the alternation is 
guaranteed to obtain an objective $\rho^*$ that is at least as good 
as the previous one since a solution to the previous iteration 
is also valid for the current one. Combined with the fact 
that the objective must be bounded above for any realistic 
problem with a bounded region of attraction, we conclude 
that the sequence of optimal values produced by the above 
alternation scheme converges. 
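
The binary search over $\rho$ in Step (1) can be sketched as follows. This is only an illustration under stated assumptions: the scalar system $\dot{x} = -x + x^3$ with $V(x) = x^2$ is a hypothetical example, and the function `certified` replaces the SOS feasibility problem with a sampled check of $\dot{V}(x) < 0$ on the sub-level set. For this example the true region of attraction is $|x| < 1$, so the search should approach $\rho = 1$.

```python
import math


def vdot(x):
    """Vdot = dV/dx * f(x) for V(x) = x^2 and xdot = -x + x^3 (illustrative system)."""
    return 2.0 * x * (-x + x**3)


def certified(rho, num_points=2001):
    """Sampled stand-in for the SOS feasibility check: Vdot < 0 on {V <= rho}, x != 0."""
    r = math.sqrt(rho)
    for k in range(num_points):
        x = -r + 2.0 * r * k / (num_points - 1)
        if x != 0.0 and vdot(x) >= 0.0:
            return False
    return True


def bisect_rho(lo=0.0, hi=4.0, iters=40):
    """Binary search for the largest rho whose sub-level set passes the check."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if certified(mid):
            lo = mid  # mid is verified, so the answer is at least mid
        else:
            hi = mid  # mid fails, so the answer is below mid
    return lo
```

In the actual procedure each call to the feasibility check is a semidefinite program, so the number of bisection steps directly sets the cost of Step (1).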

Our approach requires one to have a good initial guess 
for the Lyapunov function. For this, one can simply use a 
linear control technique like the Linear Quadratic Regulator 
that provides a candidate $V(x)$ (and scale it to satisfy the 
normalization constraint). 

Finally, an important observation that will help us in 
designing time-varying controllers in Section III is that by 
eliminating the non-negativity constraint on $L(x)$ in the above 
SOS program, one can design controllers that make $B_\rho$ 
an invariant set instead of a region of attraction. Relaxing 
constraint (4) has the effect of checking condition (3) only 
on the boundary of the set $B_\rho$. This is because condition 
(3) now implies that $\dot{V}(x) < 0$ when $V(x)$ equals $\rho$. Thus, 
trajectories that start in the set remain in the set for all time. 

^1 SOS decompositions obtained from numerical solvers generically 
provide proofs of polynomial positivity as opposed to mere non-negativity 
(see the discussion in [1, p.41]). This is why we claim a strict inequality 
on $\dot{V}$. 
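
The role of the sign constraint on the multiplier can be spelled out in one short derivation. This is a restatement of the argument in the text, with $q(x)$ introduced here only as shorthand for the polynomial in condition (3), taken to be nonnegative as the SOS constraint guarantees.

```latex
\begin{align*}
q(x) &:= -\dot{V}(x) + L(x)\,\bigl(V(x) - \rho\bigr) \ \ge 0
  && \text{(condition (3))} \\
V(x) = \rho \ &\Longrightarrow\ -\dot{V}(x) = q(x) \ \ge 0
  && \text{(boundary, any } L(x)\text{)} \\
V(x) < \rho,\ L(x) \ge 0 \ &\Longrightarrow\ -\dot{V}(x) = q(x) - L(x)\,\bigl(V(x) - \rho\bigr) \ \ge q(x) \ \ge 0
  && \text{(interior, needs (4))}
\end{align*}
```

Without constraint (4) the last line is no longer available, so only the boundary statement remains, which is what invariance of the set requires.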

III. Time Varying Controller Design 

We now extend the ideas presented in Section II in order 
to design controllers that stabilize systems along pre-planned 
trajectories. This approach builds on the work presented in 
[25], which seeks to compute regions of finite time invariance 
("funnels") around trajectories for a fixed feedback controller. 
Here, our aim is to design time-varying controllers that 
maximize the size of the funnel, i.e., maximize the size of 
the set of states that are stabilized to a pre-defined goal set. 

More formally, let $\dot{x} = f(x) + g(x)u$ be the control system 
under consideration. Let $x_0(t) : [0,T] \to \mathbb{R}^n$ be the 
nominal trajectory that we want the system to follow and 
$u_0(t) : [0,T] \to \mathbb{R}^m$ be the nominal open-loop control 
input. (In this section, we do not focus on how one might 
obtain such nominal trajectories and associated control 
inputs. This is briefly discussed in Section V.) Defining new 
coordinates $\bar{x} = x - x_0(t)$ and $\bar{u} = u - u_0(t)$, we can 
rewrite the dynamics in these variables as: 

$$\dot{\bar{x}} = \dot{x} - \dot{x}_0(t) = f(x_0(t) + \bar{x}) + g(x_0(t) + \bar{x})\,(u_0(t) + \bar{u}) - \dot{x}_0(t).$$

Then, given a goal set $B_f$, we seek to compute inner estimates 
of the time-varying sets $B_\rho(t)$ that satisfy the following 
invariance condition $\forall t_0 \in [0,T]$: 

$$\bar{x}(t_0) \in B_\rho(t_0) \implies \bar{x}(t) \in B_\rho(t) \quad \forall t \in [t_0, T]. \qquad (6)$$

Letting $B_\rho(T) = B_f$, this implies that any point that starts off 
in the "funnel" defined by $B_\rho(t)$ is driven to the goal set. Our 
task will be to design time-varying controllers that maximize 
the size of this funnel. 

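Proceeding as in Section II, we describe the funnel as a 
time-varying sub-level set of a function $V(\bar{x},t)$: 

$$B_\rho(t) = \{\bar{x} \mid V(\bar{x},t) \le \rho(t)\}$$

and require: 

$$V(\bar{x},t) = \rho(t) \implies \dot{V}(\bar{x},t) < \dot{\rho}(t). \qquad (7)$$

It is easy to see that this condition implies the invariance 
condition (6). Here, $\dot{V}(\bar{x},t)$ is computed as: 

$$\dot{V}(\bar{x},t) = \frac{\partial V(\bar{x},t)}{\partial \bar{x}}\,\dot{\bar{x}} + \frac{\partial V(\bar{x},t)}{\partial t}.$$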

In principle, we can parameterize our function $V(\bar{x},t)$ as 
a polynomial in both $t$ and $\bar{x}$ and check (7) 
$\forall t \in [0,T]$. However, as described in [25], this leads to 
expensive sums-of-squares programs. Instead, we can get large 
computational gains with little loss in accuracy by checking (7) 
at sample points in time $t_i \in [0,T]$, $i = 1 \dots N$. As 
discussed in [25], for a fixed $V(\bar{x},t)$ and dynamics (and 
under mild conditions on both), increasing the density of the 
sample points eventually recovers (7) $\forall t \in [0,T]$. This 
allows us to check the answers we obtain from the 
sums-of-squares program below by sampling finely enough. 



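Thus, we parameterize $V(\bar{x},t)$ and $\bar{u}$ by polynomials 
$V_i(\bar{x})$ and $\bar{u}_i(\bar{x})$ respectively at each sample point 
in time.^2 Using $\sum_{i=1}^{N} \rho(t_i)$ as the cost function, we 
can write the following sums-of-squares program: 

$$
\begin{aligned}
\underset{\rho(t_i),\,L_i(\bar{x}),\,V_i(\bar{x}),\,\bar{u}_i(\bar{x})}{\text{maximize}} \quad & \sum_{i=1}^{N} \rho(t_i) \qquad (8) \\
\text{subject to} \quad & V_i(\bar{x}) \ \text{SOS} \qquad (9) \\
& -\dot{V}_i(\bar{x}) + \dot{\rho}(t_i) - L_i(\bar{x})\,(V_i(\bar{x}) - \rho(t_i)) \ \text{SOS} \qquad (10) \\
& V_i\Big(\sum_j e_j\Big) = V_{guess}\Big(\sum_j e_j,\, t_i\Big) \qquad (11)
\end{aligned}
$$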



Similar to the SOS program in Section II, the $L_i(\bar{x})$ are 
"multiplier" terms that help to enforce the invariance 
condition (note that there is no sign constraint on these 
multipliers). Condition (11) is a normalization constraint, where 
$V_{guess}(\bar{x},t)$ is the candidate for $V(\bar{x},t)$ that is used for 
initializing the alternation scheme outlined below. We use 
a piecewise linear parameterization of $\rho(t)$ and can thus 
compute 

$$\dot{\rho}(t_i) = \frac{\rho(t_{i+1}) - \rho(t_i)}{t_{i+1} - t_i}.$$

Similarly, we compute 

$$\frac{\partial V(\bar{x},t)}{\partial t} \approx \frac{V_{i+1}(\bar{x}) - V_i(\bar{x})}{t_{i+1} - t_i}.$$
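
The two finite-difference formulas above can be sketched in a few lines. This is a minimal illustration, assuming (hypothetically) that each $V_i$ is stored as a dictionary mapping a monomial label to its coefficient; the function names and this storage format are not from the paper.

```python
def rho_dot(times, rho):
    """Slope of the piecewise linear rho(t) on each interval [t_i, t_{i+1}]."""
    return [
        (rho[i + 1] - rho[i]) / (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    ]


def dV_dt(times, V_coeffs):
    """Finite-difference estimate of dV/dt, applied coefficient by coefficient.

    V_coeffs[i] maps a monomial label (e.g. "x1^2") to its coefficient in V_i.
    A monomial missing from one sample is treated as having coefficient zero.
    """
    derivatives = []
    for i in range(len(times) - 1):
        dt = times[i + 1] - times[i]
        monomials = sorted(set(V_coeffs[i]) | set(V_coeffs[i + 1]))
        derivatives.append(
            {
                m: (V_coeffs[i + 1].get(m, 0.0) - V_coeffs[i].get(m, 0.0)) / dt
                for m in monomials
            }
        )
    return derivatives
```

Because both expressions are linear in the unknown coefficients of $V_i$ and in $\rho(t_i)$, they can be substituted into condition (10) without changing which variables enter linearly.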

The above optimization program is again not convex in 
general since it involves conditions that are bilinear in the 
decision variables. However, the conditions are linear in 
$L_i(\bar{x})$ and $\bar{u}_i(\bar{x})$ for fixed $V_i(\bar{x})$ and $\rho(t_i)$, and are 
linear in $V_i(\bar{x})$ and $\rho(t_i)$ for fixed $L_i(\bar{x})$ and 
$\bar{u}_i(\bar{x})$. Thus, in principle we could use a similar bilinear 
alternation scheme as the one used for designing time-invariant 
controllers in Section II. However, in the first step of this 
alternation, it is no longer possible to do a bisection search 
on $\rho(t)$ since it is parameterized with a different variable at 
each sample point in time (it is possible in principle to 
perform a bisection search over multiple variables 
simultaneously, but this is prohibitively expensive 
computationally). We could simply make the first step a 
feasibility problem (instead of optimizing a cost function), but 
this prevents us from searching for a controller that explicitly 
seeks to maximize the desired cost function, 
$\sum_{i=1}^{N} \rho(t_i)$, since in the second step of the 
alternation, we do not search for a controller. We get around 
this issue by introducing a third step in the alternation, in 
which we fix $L_i(\bar{x})$ and $V_i(\bar{x})$ and search for 
$\bar{u}_i(\bar{x})$ and $\rho(t_i)$ and maximize 
$\sum_{i=1}^{N} \rho(t_i)$. The steps in the alternation are 
summarized in Algorithm 1. By a similar reasoning to the one 
provided in Section II, we can conclude that the sequence of 
optimal values produced by Algorithm 1 converges. 
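
The control flow of the three-step alternation can be sketched as below. This shows only the loop and the stopping rule; the step functions are hypothetical toy stand-ins (each "maximize" step simply moves every $\rho(t_i)$ halfway toward an upper bound of 1), not SOS programs, and the dictionary-based `state` is an assumption made for the illustration.

```python
def alternate(steps, state, eps=1e-3, max_iters=100):
    """Cycle through the steps until the relative gain in sum(rho) is below eps."""
    prev_total = None
    for iteration in range(1, max_iters + 1):
        for step in steps:
            state = step(state)
        total = sum(state["rho"])
        # Skip the test on the first pass, where no previous total exists.
        if prev_total and (total - prev_total) / prev_total < eps:
            return state, iteration
        prev_total = total
    return state, max_iters


def toy_feasibility(state):
    """Stand-in for STEP 1: a feasibility solve leaves rho unchanged."""
    return dict(state)


def toy_maximize(state):
    """Stand-in for STEPS 2 and 3: never decreases rho, bounded above by 1."""
    new_state = dict(state)
    new_state["rho"] = [r + 0.5 * (1.0 - r) for r in state["rho"]]
    return new_state


final_state, iterations = alternate(
    [toy_feasibility, toy_maximize, toy_maximize], {"rho": [0.1, 0.1]}
)
```

The toy steps share the two properties the convergence argument relies on: the objective never decreases from one step to the next, and it is bounded above.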

Section V-A discusses how to initialize $V_i(\bar{x})$ and $\rho(t_i)$ for 
Algorithm 1. 

Algorithm 1 Time-Varying Controller Design 

    Initialize $V_i(\bar{x})$ and $\rho(t_i)$, $\forall i = 1 \dots N$. 
    $\rho_{prev}(t_i) = 0$, $\forall i = 1 \dots N$. 
    converged = false; 
    while $\neg$converged do 
        STEP 1: Solve feasibility problem by searching for 
                $L_i(\bar{x})$ and $\bar{u}_i(\bar{x})$ and fixing $V_i(\bar{x})$ and $\rho(t_i)$. 
        STEP 2: Maximize $\sum_{i=1}^{N} \rho(t_i)$ by searching for $\bar{u}_i(\bar{x})$ 
                and $\rho(t_i)$, and fixing $L_i(\bar{x})$ and $V_i(\bar{x})$. 
        STEP 3: Maximize $\sum_{i=1}^{N} \rho(t_i)$ by searching for $V_i(\bar{x})$ 
                and $\rho(t_i)$, and fixing $L_i(\bar{x})$ and $\bar{u}_i(\bar{x})$. 
        if $\frac{\sum_{i=1}^{N} \rho(t_i) - \sum_{i=1}^{N} \rho_{prev}(t_i)}{\sum_{i=1}^{N} \rho_{prev}(t_i)} < \epsilon$ then 
            converged = true; 
        end if 
        $\rho_{prev}(t_i) = \rho(t_i)$, $\forall i = 1 \dots N$. 
    end while 

^2 Throughout, a subscript $i$ next to a function will denote that the 
function is parameterized only at sample points in time. The notation 
$V(\bar{x},t)$ will denote that the function is defined continuously for all 
time in the specified interval. 

IV. Incorporating Actuator Limits 

A. Approach 1 

An important advantage of our method is that it allows 
us to incorporate actuator limits into the control design 
procedure. Although we examine the single-input case in 
this section, this framework is very easily extended to handle 
multiple inputs. 

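Let the control law $\bar{u}(\bar{x})$ be mapped through the following 
control saturation function: 

$$
s(\bar{u}(\bar{x})) =
\begin{cases}
u_{max} & \text{if } \bar{u}(\bar{x}) \ge u_{max} \\
u_{min} & \text{if } \bar{u}(\bar{x}) \le u_{min} \\
\bar{u}(\bar{x}) & \text{o.w.}
\end{cases}
$$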
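
For reference, the saturation map above corresponds to the following few lines. This is a minimal sketch for the single-input case; the function name and the 5 Nm bound used in the usage line are illustrative choices, not part of the method.

```python
def saturate(u, u_min, u_max):
    """Clip a commanded input to the interval [u_min, u_max]."""
    if u >= u_max:
        return u_max
    if u <= u_min:
        return u_min
    return u


# Example: a symmetric torque limit applied to three commanded values.
clipped = [saturate(u, -5.0, 5.0) for u in (-7.5, 1.2, 9.0)]
```

The three branches are exactly the three regions over which the Lyapunov conditions are checked separately in the piecewise analysis that follows.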



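Then, in a manner similar to [24], a piecewise analysis of 
$\dot{V}(\bar{x},t)$ can be used to check that the Lyapunov conditions 
are satisfied even when the control input saturates. Defining: 

$$\dot{V}_{min}(\bar{x},t) = \left(\frac{\partial V(\bar{x},t)}{\partial \bar{x}}\right)^T \bigl(f(\bar{x}) + g(\bar{x})\,u_{min}\bigr) + \frac{\partial V(\bar{x},t)}{\partial t} \qquad (12)$$

$$\dot{V}_{max}(\bar{x},t) = \left(\frac{\partial V(\bar{x},t)}{\partial \bar{x}}\right)^T \bigl(f(\bar{x}) + g(\bar{x})\,u_{max}\bigr) + \frac{\partial V(\bar{x},t)}{\partial t} \qquad (13)$$

we must check the following conditions: 

$$\bar{u}(\bar{x}) \le u_{min} \implies \dot{V}_{min}(\bar{x},t) < \dot{\rho}(t) \qquad (14)$$

$$\bar{u}(\bar{x}) \ge u_{max} \implies \dot{V}_{max}(\bar{x},t) < \dot{\rho}(t) \qquad (15)$$

$$u_{min} \le \bar{u}(\bar{x}) \le u_{max} \implies \dot{V}(\bar{x},t) < \dot{\rho}(t) \qquad (16)$$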



The SOS program in Section III can be modified to enforce 
these conditions with extra multipliers, $M_u(\bar{x})$ (similar to 
[24]). The addition of the new multipliers means that we can 
no longer search for the controller in Step 1 of Algorithm 1. 
This is because searching for the new multipliers and the 
controller at the same time makes the problem bilinear in the 
decision variables. Thus, in Step 1, we only search for all 
the multipliers (with fixed $\bar{u}_i(\bar{x})$, $V_i(\bar{x})$ and $\rho(t_i)$). In 
Step 2, we hold $V_i(\bar{x})$ and all the multipliers constant and 
search for $\bar{u}_i(\bar{x})$ and $\rho(t_i)$. In Step 3, we fix the controller 
and $L_i(\bar{x})$ and search for $V_i(\bar{x})$, $\rho(t_i)$ and $M_u(\bar{x})$ (we 
can do this since the controller is fixed). 

B. Approach 2 

Although one can handle multiple inputs via the above 
method, the number of SOS conditions grows exponentially 
with the number of inputs ($3^m$ conditions for $\dot{V}$ are needed 
in general to handle all possible combinations of input 
saturations). Thus, for systems with a large number of 
inputs, we propose an alternative formulation that avoids this 
exponential growth in the size of the SOS program at the cost 
of adding conservativeness to the size of the funnel. Given 
element-wise limits on the control vector $u \in \mathbb{R}^m$ of the 
form $u_{min,k} \le u_k \le u_{max,k}$, $k = 1 \dots m$, we can ask to 
satisfy: 

$$\bar{x} \in B_\rho(t_i) \implies u_{min,k} \le u_k(\bar{x}) \le u_{max,k}, \quad \forall k = 1 \dots m.$$

This constraint implies that the applied control input remains 
within the specified bounds inside the verified funnel (a 
conservative condition), and can be imposed in each of the three 
steps in Algorithm 1 with the addition of new multipliers. 
The number of extra constraints grows linearly with the 
number of inputs (since we have one new condition for every 
input), thus leading to smaller optimization problems. 
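
The difference in growth between the two approaches can be illustrated by enumerating the cases directly. This is a counting sketch only; the labels "min", "free" and "max" and the function names are hypothetical, and the count for Approach 2 follows the statement above of one new condition per input.

```python
from itertools import product


def saturation_cases(m):
    """All combinations of (saturated low, unsaturated, saturated high) over m inputs."""
    return list(product(("min", "free", "max"), repeat=m))


def approach1_conditions(m):
    """One Vdot condition per saturation combination: 3^m in total."""
    return len(saturation_cases(m))


def approach2_conditions(m):
    """One extra bound condition per input: linear in m."""
    return m
```

For a system with six inputs, for example, this gives 729 combinations under Approach 1 against six added conditions under Approach 2, at the price of the conservativeness noted above.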

V. Implementation Details 
A. Initializing $V_i(\bar{x})$ and $\rho(t_i)$ 

Obtaining an initial guess for $V_i(\bar{x})$ and $\rho(t_i)$ is an 
important part of Algorithm 1. In [24], the authors use the 
Lyapunov function candidate associated with a time-varying 
LQR controller. The control law is obtained by solving a 
Riccati differential equation: 

$$-\dot{S}(t) = Q - S(t)B(t)R^{-1}B(t)^T S(t) + S(t)A(t) + A(t)^T S(t)$$

with final value conditions $S(T) = S_f$. Here $A(t)$ and $B(t)$ 
describe the time-varying linearization of the dynamics about 
the nominal trajectory $x_0(t)$. $Q$ and $R$ are positive-definite 
cost-matrices. The function: 

$$V_{guess}(\bar{x},t) = (x - x_0(t))^T S(t)\,(x - x_0(t)) = \bar{x}^T S(t)\,\bar{x}$$

is our initial Lyapunov candidate. $V_{guess}(\bar{x},t_N) = \bar{x}^T S_f \bar{x}$, 
along with a choice of $\rho(t_N)$, can be used to determine the 
goal set, $B_f$ (Section III), since we have: 

$$B_f = \{\bar{x} \mid \bar{x}^T S_f \bar{x} \le \rho(t_N)\}.$$

We find that setting $\rho(t_i)$ to a small enough constant works 
quite well in practice. 

B. Trajectory generation 

An important step that is necessary for the success of 
the control design scheme described in this paper is the 
generation of a dynamically feasible open-loop control input 
$u_0(t) : [0,T] \to \mathbb{R}^m$ and corresponding nominal trajectory 
$x_0(t) : [0,T] \to \mathbb{R}^n$. A method that has been shown to work 
well in practice and scale to high dimensions is the direct 
collocation trajectory optimization method [4]. While this 
is the approach we use for the results in Section VI, other 
methods like the Rapidly Exploring Randomized Tree (RRT) 
or its asymptotically optimal version, RRT*, can be used too 
[13], [11]. 

VI. Experimental Validation 

We validate our approach with experiments on a severely 
torque limited underactuated double pendulum ("Acrobot") 
[22]. The hardware platform, shown in Figure 1, has no 
actuation at the "shoulder" joint $\theta_1$ and is driven only at 
the "elbow" joint $\theta_2$. A friction drive is used to drive the 
elbow joint. While this prevents the backlash one might 
experience with gears, it imposes severe torque limitations 
on the system. This is due to the fact that torques greater 
than 5 Nm cause the friction drive to slip. Thus, in order to 
obtain consistent performance, it is very important to obey 
this input limit. Encoders in the joints report joint angles 
to the controller at 200 Hz and finite differencing and a 
standard Luenberger observer [17] are used to compute joint 
velocities. 

The prediction error minimization method in MATLAB's 
System Identification Toolbox [15] was used to identify 
parameters of the model presented in [22]. The dynamics 
were then Taylor expanded to degree 3 in order to obtain a 
polynomial vector field.^3 

Fig. 2. A comparison of the guaranteed basins of attraction for the 
time-invariant LQR controller (blue) and the cubic SOS controller (red) 
designed for balancing the Acrobot in the upright position. 

We designed an open-loop motion plan for the swing-up 
task using direct collocation trajectory optimization [4] by 
constraining the initial and final states to $[0,0,0,0]^T$ and 
$[\pi,0,0,0]^T$ respectively. We then designed a time-invariant 
nonlinear controller (cubic in the four dimensional state 
$x = [\theta_1, \theta_2, \dot{\theta}_1, \dot{\theta}_2]^T$) using the method from 
Section II for balancing the Acrobot about the upright position 
($[\pi,0,0,0]^T$). Figure 2 compares projections of the SOS 
verified funnels for the LQR (blue) and cubic (red) controllers 
onto the $\theta_2$-$\theta_1$ subspace. As the plot demonstrates, the 
cubic controller has a significantly larger guaranteed basin of 
attraction. Other projections result in a similar picture. 

A linear time-varying controller was designed using the 
approach presented in Sections III and IV-A in order to 
maximize the size of the funnel along the swing-up 
trajectory, with the goal set given by the verified region of 
attraction for the time-invariant controller. 105 sample points 
in time, $t_i$, were used for the verification. For both the 
time-invariant balancing controller and the time-varying 
swing-up controller, we use Lyapunov functions, $V$, of degree 2. 
LQR controllers were used to initialize the sums-of-squares 
programs for both controllers. 

We implement our sums-of-squares programs using the 
YALMIP toolbox [16], and use SeDuMi [23] as our 
semidefinite optimization solver. A 4.1 GHz PC with 16 GB RAM 
and 4 cores was used for the computations. The time taken 
for Step 1 of Algorithm 1 during one iteration of the 
alternation was approximately 12 seconds. Steps 2 and 3 took 
approximately 36 and 70 seconds (per iteration) respectively. 
39 iterations of the alternation scheme were required for 
convergence, although we note that a better method for 
initializing $\rho(t_i)$ than the one presented in Section V-A is 
likely to decrease this number. 

Figures 3(a) and 3(b) compare projections (onto different 
subspaces of the full 4-d state space) of the funnels obtained 
from sums-of-squares for both the SOS controller and for 
the time-varying LQR controller. To obtain the funnel for 
the LQR controller, we simply solve the SOS program in 
Section III without searching for the controller (the approach 
taken in [25]). As the plots show, the funnels for the SOS 
controller are significantly bigger than the LQR funnels. In 
fact, by solving a simple SOS program, we found that the 
verified set of initial conditions, $B_\rho(0)$, driven to the goal 
set by the LQR controller is strictly contained within the 
corresponding set for the SOS controller. Projections of the 
two sets are depicted in Figures 3(c) and 3(d). 

(a) $\theta_2$-$\theta_1$ projection of SOS funnel (red) compared to LQR 
funnel (blue) 

(b) $\theta_1$-$\dot{\theta}_1$ projection of SOS funnel (red) compared to LQR 
funnel (blue) 

(c) $\theta_2$-$\theta_1$ projection of set of initial conditions, $B_\rho(0)$, 
driven to goal set by SOS controller (red) compared to 
corresponding set for the LQR controller (blue) 

(d) $\theta_1$-$\dot{\theta}_1$ projection of set of initial conditions, $B_\rho(0)$, 
driven to goal set by SOS controller (red) compared to 
corresponding set for the LQR controller (blue) 

Fig. 3. Comparison of verified SOS funnels for the SOS and LQR 
controllers. 

We validate the funnel for the controller obtained from 
SOS with 30 experimental trials of the Acrobot swinging up 
and balancing. The robot is started off from random initial 
conditions drawn from within the SOS verified funnel and 
the time-varying SOS controller is applied for the duration of 
the trajectory. At the end of the trajectory, the robot switches 
to the cubic time-invariant balancing controller. Figures 
4(a)-4(c) provide plots of this experimental validation. Plots 
4(a) and 4(b) show the 30 trajectories superimposed on the funnel 
projected onto different subspaces of the 4-dimensional state 
space. Note that remaining inside the projected funnel is a 
necessary but not sufficient condition for remaining within 
the funnel in the full state space. Plot 4(c) shows the value 
of $V(\bar{x},t)$ achieved during the different experimental trials 
($V(\bar{x},t) < \rho$ implies that the trajectory is inside the funnel at 
that time). The plot demonstrates that for most of the duration 
of the trajectory, the experimental trials lie within the verified 
funnel. However, violations are observed towards the end. 
This can be attributed to state estimation errors and model 
inaccuracies (particularly in capturing the slippage caused by 
the friction drive between the two links) and also to the fact 
that the Lyapunov function has a large gradient with respect 
to $\bar{x}$ towards the end. Thus, even though the trajectories 
deviate from the nominal trajectory only slightly in Euclidean 
distance (as plots 4(a) and 4(b) demonstrate), these deviations 
are enough to cause a large change in the value of $V(\bar{x},t)$. 
We note that all 30 experimental trials resulted in the robot 
successfully swinging up and balancing. Figure 4(d) plots 
$V(\bar{x},t)$ for 100 simulated experiments of the system started 
off from random initial conditions inside the funnel. All 
trajectories remain inside the funnel, suggesting that the 
violations observed in Figure 4(c) are in fact due to modeling 
and state estimation errors. Section VII-A discusses how the 
method presented in this paper can be extended to deal with 
model inaccuracies and state estimation error. 

(a) $\theta_2$-$\theta_1$ projection of experimental trajectories superimposed 
on funnel. 

(b) $\theta_1$-$\dot{\theta}_1$ projection of experimental trajectories superimposed 
on funnel. 

(c) $V(\bar{x},t)$ for 30 experimental trials 

(d) $V(\bar{x},t)$ for 100 simulated trials 

Fig. 4. Results from experimental trials on Acrobot. 

^3 Taylor expanding the dynamics is not strictly necessary since 
sums-of-squares programming can handle trigonometric as well as 
polynomial terms [19]. In practice, however, we find that the Taylor 
expanded dynamics lead to trajectories that are nearly identical to the 
original ones and thus we avoid the added overhead that comes with 
directly dealing with trigonometric terms. 

VII. Discussion 

While we found that a cubic time-invariant controller and 
a linear time-varying controller for the swing-up task gave 
substantial improvements in the size of the SOS verified 
funnels over LQR, the use of higher degree controllers 
may provide even better results. Also, one can use higher 
degree Lyapunov functions in order to get tighter estimates 
of the true regions of attraction and funnels. This requires 
no modification to the approach presented in Sections II 
and III. Several straightforward extensions to the framework 
presented in this paper are also possible. These are discussed 
below. 

A. Robustness 


As observed in Section VI, modeling and state estimation 
errors can cause the guarantees given by our control design 
technique to be violated in practice. While the violations 
are small in the hardware experiments presented in Section 
VI, this may not be the case in application domains where 
the dynamics are difficult or impossible to model accurately 
(e.g. collision dynamics of walking robots, UAVs subjected to 
wind gusts). In such scenarios, we must explicitly account 
for the uncertainty in the dynamics (and possibly in state 
estimates that the controller has access to). The control 
design framework presented in this paper allows us to do 
this. Given a polynomial system, $\dot{x} = f(x,u,w)$, where 
$w \in W$ is a bounded disturbance/uncertainty term that enters 
polynomially into the dynamics, the following condition is 
sufficient for checking invariance of the uncertain system: 

$$V(\bar{x},t) = \rho(t) \implies \dot{V}(\bar{x},t,w) < \dot{\rho}(t), \quad \forall w \in W. \qquad (17)$$

Here, $W$ must be a semi-algebraic set. The SOS programs 
in Sections III and IV can be modified (via the addition of 
multiplier terms) to check this condition (similar to [18], 
which computes funnels for systems subjected to distur- 
bances and uncertainty for a fixed controller). While one 
cannot guarantee in general that we can compute a SOS 
funnel for the uncertain dynamics, we are more likely to 
obtain funnels when we search for the controller. Also, the 
size of the guaranteed funnels (and indeed the magnitude of 
the allowable disturbances) will be larger when we search 
for the controller. 

B. Obstacles and Kinematic Constraints 

Obstacles and other kinematic constraints (such as joint 
limits) can also be incorporated into the control design 
procedure. Given a polytopic obstacle defined by half-plane 
constraints $A_j x \ge 0$ for $j = 1, \dots, M$, we can impose the 
following condition: 

$$A_j x \ge 0,\ \forall j \implies V(\bar{x},t) > \rho(t).$$

This ensures that the computed funnel does not intersect the 
obstacle (since inside the obstacle, we have $V(\bar{x},t) > \rho(t)$). 

VIII. Conclusions 

We have presented an approach for designing time-varying 
controllers that explicitly optimize the size of the guaranteed 
"funnel", i.e., the set of initial conditions driven to the 
goal set. Our method uses Lyapunov's stability theory and 
sums-of-squares programming in order to guarantee that all 
trajectories that start off inside the funnel are driven to 
the goal state, and is able to handle time-varying dynamics 
and constraints on the inputs. We demonstrate our approach 
on the "swing-up and balance" task on a severely torque- 
limited underactuated double pendulum (Acrobot). To our 
knowledge, our hardware experiments constitute the first 
experimental validation of funnels computed using 
sums-of-squares programming. The size of the guaranteed funnels we 
obtain on the Acrobot by searching for a controller shows a 
significant improvement over funnels computed when one 
fixes the controller to a time-varying LQR controller. In 
practice, this means that a space-filling algorithm like LQR- 
Trees [24] or an online planning algorithm such as the one 
presented in [18] can fill the space with significantly fewer 
trajectories. Our basic approach can also be extended to deal 
with uncertainty in the dynamics, state estimation error and 
kinematic constraints (e.g. obstacles). 